Abstract

This article develops dimension reduction techniques for panel data analysis when
the numbers of individuals and indicators are both large. We use principal component
analysis (PCA) to represent a large number of indicators by a small number of common
factors in factor models. We propose the Dynamic Mixed Double Factor Model (DMDFM
for short) to capture cross-section and time series correlation through an interactive
factor structure. The DMDFM not only reduces the dimension of the indicators but also
accounts for the mixed time series and cross-section effects. Unlike other models, the
mixed factor model has two types of common factors: the regressor factors reflect
common trends and reduce the dimension, while the error-component factors reflect
differences and weak correlation among individuals. Monte Carlo simulation results show
that the Generalized Method of Moments (GMM) estimators have good unbiasedness and
consistency properties. The simulations also show that the DMDFM can effectively
improve the predictive power of the models.

Key words: Panel data; Dynamic Mixed Double Factor Model; Identification; GMM 
estimation; Cross-section and time series correlation 
1 Introduction

Processing large-scale macroeconomic data sets has long been one of the cumbersome
problems in panel data analysis. Compared with micro panel data, macro panel data
have more indicators, which are usually correlated with each other. Panel data combine
cross-section and time series data, so cross correlation arises from two sources:
dependence across periods and dependence across individuals. If such dependence
exists, wherever it comes from, the panel data model should reflect it, e.g., when
comparing the economic development of countries or regions. If every country or region
is regarded as an individual and observed over consecutive periods, the data exhibit
cross-section and time series correlation because some of the individuals share the same
economic structure and common trends. Micro panel data raise an analogous issue: asset
allocation and portfolio management in the stock market, for example, focus on industry
effects and security market volatility simultaneously, which can also be seen as
cross-section and time series correlation. On the other hand, for these large
dimensional panel data sets we need not only to study the correlation between variables
and individuals, but also to reduce the number of indicators; the latter is usually
known as dimension reduction.

Factor models have long been used to analyze large-scale macroeconomic data sets.
These macro data sets have hundreds of indicators and some common trends owing to
co-movements of the variables, where the common trends reflect cross-section
correlation. Chamberlain and Rothschild (1983) use an approximate factor structure to
study risk-free arbitrage portfolios for a large number of assets that are weakly
correlated with each other. They obtain the same results as the arbitrage pricing theory
of Ross (1976). Forni et al. (2000) propose a method of identification and estimation
for the Generalized Dynamic Factor Model (GDFM). The GDFM is a factor model that
includes lag terms of the factors and cross-correlation of the idiosyncratic components.

The inner structure of a factor model can involve the error component and the
regressors respectively. Some factor models only consider factor decomposition of the
error component; see Ahn, Lee and Schmidt (2001), Moon and Perron (2004), Fan, Fan
and Lv (2005), Bai (2009), among many others. They discuss unobservable interactive
effects of individuals and periods in the error components and allow the heterogeneous
errors to be correlated with the regressors, extracting the error components through
factor decomposition. Others consider factor decomposition of the regressors; see Forni
et al. (2000), Stock and Watson (2002), Bai (2003), Anderson and Deistler (2008), etc.
In this case, the regressors are expressed by two unobservable orthogonal components.
Common shocks are expressed by a small number of common factors, which are used for
dimension reduction. Idiosyncratic components are expressed through the factor loadings
and reflect the differences between individuals. Furthermore, others consider factor
decomposition of both the error component and the regressors; see Andrews (2005) and
Pesaran (2006), among others. They discuss the multifactor error structure and the
cross-section dependence of individuals due to common shock effects.

The lag effects in the general dynamic factor model are the lag terms of the common
factors, i.e., AR or MA processes of the common factors. These processes can reflect
persistent effects on individuals across periods. The VAR processes of dynamic factor
models are also based on lag terms of the common factors (e.g., Stock and Watson
(2005)). The dynamics of the common factors derive from the lag effects of the
regressors. In a statistical model the dependent variable is estimated from the
regressors, and both current and past values of the regressors influence the dependent
variable if lag terms of the regressors are introduced into the model. In real
applications, lag terms of the dependent variable can also influence its current values.
Stock and Watson (2002) use lag effects of the dependent variable to forecast the macro
economy, but they ultimately convert them into lag terms of the common factors without
considering the time series correlation of the regressors.

Panel data modeling must consider not only time series correlation but also
cross-section correlation. We propose a Mixed Double Factor Model (MDFM) in this
article. The double factors refer to the factor decompositions of the regressors and of
the error components respectively. The MDFM takes into account the structure of panel
data with respect to both time and individuals. We introduce the common factors and
factor loadings of the regressors and error components to reflect cross-section
correlation, and treat lag terms of the dependent variable as endogenous variables to
reflect time series correlation. A Mixed Double Factor Model that includes lag effects
of the dependent variable is called a Dynamic Mixed Double Factor Model (DMDFM).

Unlike time series and cross-section data, panel data have three dimensions:
individuals, periods and variables. We first consider the case of short panels, where
the number of individuals N is larger than the number of periods T; we relax this
condition at the end of the article. At the same time, the number of observable
variables p can be larger than N and T. Classical statistical modeling methods then face
a multicollinearity problem. We extract factors from the regressors by principal
component analysis (PCA). By letting a small number of common factors (factor scores)
represent a large number of explanatory variables, we reduce the number of indicators
and of parameters to be estimated. In addition, the common factors reflect the
correlation among the variables.

The DMDFM includes lag terms of the dependent variable on the right-hand side (RHS),
and these are correlated with the common factors of the regressors and of the error
component. We therefore use the generalized method of moments (GMM) to estimate the
model. Arellano and Bover (1995) study linear moment conditions and the choice of the
optimal weighting matrix in GMM estimation of dynamic panel data models. The DMDFM has
a more complicated structure than the classical dynamic panel data model because it
includes double factors. In this case, the choice of optimal instrumental variables is
very important. We divide the estimation of the DMDFM into two steps. First, we obtain
by GMM estimation the idiosyncratic component correlated with the regressors, then use
PCA to decompose it and substitute the results of the factor decomposition into the
original model. Second, we apply difference transformations to the model and estimate
the new model with error factors by GMM. With this two-step iterative method we obtain
the uniformly optimal estimators. The results of the two-step estimation can be used to
predict future values of the dependent variable.

The rest of this article is organized as follows. Section 2 gives some notation and
the construction of the DMDFM. The specification and assumptions of the DMDFM are
given in Section 3. Section 4 discusses two important problems in the DMDFM: the choice
of the number of factors and the choice of estimation method. Simulation results are
given in Section 5, where we simulate the data generating processes of the DMDFM.
Conclusions and remarks are provided in Section 6.

2 Panel data dynamic Mixed Double Factor Model 
2.1 Panel data factor model 

In a panel data model, let $X_{it}$ and $Y_{it}$ denote the observed values of the
regressors and the response for the $i$th individual in the $t$th period,
$i = 1, \dots, N$, $t = 1, \dots, T$. $X_{it}$ is a $p$-dimensional column vector, where
$p$ is the number of regressors. Hsiao (2003) considers the following model, in which
the slope coefficients are constant and the intercept term varies over individuals and
time:

$$Y_{it} = \alpha_{it}^{*} + \sum_{k=1}^{p} \beta_k x_{kit} + u_{it}$$

If the intercept terms of the above model are treated as covariates, the model can be
rewritten in matrix form:

$$Y_{it} = X_{it}'B + u_{it} \qquad (1)$$

where $B$ is a $p \times 1$ vector to be estimated and $u_{it}$ is the random error term.

Pesaran (2006) proposes estimation and inference for linear heterogeneous panel data
through a multifactor error structure model:

$$Y_{it} = A_i'D_t + X_{it}'B + u_{it}$$

where the error term has a multifactor structure:

$$u_{it} = G_t\Gamma_i' + \varepsilon_{it} \qquad (2)$$

Here $G_t$ denotes the unobservable common effects and $\varepsilon_{it}$ the individual
idiosyncratic error. If $G_t$ is correlated with $X_{it}$, then $X_{it}$ can be expressed
as a linear combination of $G_t$, which is known as the common correlated effects (CCE)
approach. Bai (2009) considers the special case in which the number of individuals $N$
and the number of periods $T$ are both very large, and the factor loadings and common
factors are treated as unobservable parameters of an interactive fixed effects model:

$$Y_{it} = X_{it}'B + G_t\Gamma_i' + \varepsilon_{it}$$

The identification, consistency and limiting distribution of the estimators of this
model have been discussed there.

In high dimensional panel data analysis, to reduce the dimension across individuals
and to reflect the dependence structure among individuals, Bai (2003) represents the
regressors of model (1) by common factors:

$$X_{it} = F_t\Lambda_i' + e_{it} \qquad (3)$$

in which $\Lambda_i$ are the factor loadings and $F_t$ is the vector of common factors.
If the number of common factors is $r$, the $r$ common factors can be written as
$F_t\Lambda_i' = \lambda_{i1}F_{1t} + \cdots + \lambda_{ir}F_{rt}$. $e_{it}$ is the
idiosyncratic error; here $\Lambda_i$, $F_t$ and $e_{it}$ are all unobservable. Model (1)
is then rewritten as:

$$Y_{it} = F_tB^{*} + u_{it}^{*} \qquad (4)$$

where $u_{it}^{*}$ is an unobservable idiosyncratic error, uncorrelated with $F_t$.

2.2 Panel data dynamic Mixed Double Factor Model

Time series and cross-section correlation may exist simultaneously, and correlation
may also exist among the indicators, so we consider correlation both in the regressors
and in the lag terms of the dependent variable when building panel data factor models.
Stock and Watson (2002, 2005) discuss the specification and estimation of multivariate
time series dynamic factor models. They use them for extrapolative prediction in the
multivariate time series case, but do not extend them to panel data models. Meanwhile,
the idiosyncratic error component $u_{it}$ of a panel data model may contain
unobservable interactive effects. Taking these aspects into account simultaneously, we
propose the following autoregressive dynamic Mixed Double Factor Model for panel data:

$$Y_{it} = Y_{iw}'\beta_L + F_{it}\beta_F + G_t\Gamma_i' + \varepsilon_{it} \qquad (5)$$

where $Y_{it}$ is the dependent variable, the observed value for the $i$th individual in
the $t$th period; $Y_{iw}$ is a column vector composed of the lag terms of $Y_{it}$,
$w = t-1, \dots, t-h$; $\beta_L$ and $\beta_F$ are $h \times 1$ and $r \times 1$ parameter
vectors to be estimated; and $F_{it}$ is an unobservable $1 \times r$ vector of common
factors. The regressors can be decomposed as:

$$X_{it} = F_{it}\Lambda' + e_{it} \qquad (6)$$

$\Lambda$ is a $p \times r$ factor loading matrix. Unlike equation (3), here the $r$
common factors are extracted from the $p$ regressors ($r < p$), whereas in equation (3)
they are extracted from the $N$ individuals. The second group of common factors $G_t$
and the corresponding factor loadings $\Gamma_i$ are unobservable $1 \times s$ vectors,
obtained from the regression equation:

$$Y_{it} = Y_{iw}'\beta_L + F_{it}\beta_F + u_{it} \qquad (7)$$

Next, extract factors from the idiosyncratic error $u_{it}$ as in equation (2), i.e.

$$u_{it} = Y_{it} - Y_{iw}'\beta_L - F_{it}\beta_F = G_t\Gamma_i' + \varepsilon_{it} \quad (i = 1, \dots, N)$$

where the $s$ common factors and corresponding factor loadings can be written as
$G_t\Gamma_i' = \gamma_{i1}G_{1t} + \cdots + \gamma_{is}G_{st}$.

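In matrix notation, omitting the individual and period subscripts, equation (7) with
the error term decomposed into factors can be rewritten compactly as:

$$Y = Y_L\beta_L + F\beta_F + G\Gamma' + \varepsilon \qquad (8)$$

where $Y$ and $Y_L$ are $T \times N$ and $T \times N \times h$ arrays respectively; $F$ is
a $T \times N \times r$ array with $r$ indicators; $G$ and $\Gamma$ are $T \times s$ and
$N \times s$ matrices respectively; and $\beta_L$ and $\beta_F$ are $h \times 1$ and
$r \times 1$ coefficient vectors.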

Model (8) is a panel data model with interactive effects in both the time series and
the cross-section dimension. In this model, the lag terms $Y_L$ reflect time series
correlation. Without loss of generality, we only consider the AR(1) model below;
higher-order autoregressive models can be analyzed in the same way. This article
proposes a way of modeling panel data factors when the number of indicators $p$ is very
large. The first group of factors $F$ aims to reduce the dimension of the regressors and
the multicollinearity among the indicators. The second group of factors $G$ reflects the
interactive effect of the error component. After the two factorizations, the
idiosyncratic error component $\varepsilon$ satisfies the model assumptions.

Model (8) generalizes many previous approximate factor models. The interactive fixed
effects model proposed by Bai (2009) considers interactive effects in the heterogeneous
error term. If we regard the first factor decomposition as an identity transformation
of the regressors and ignore the lag effect, the DMDFM becomes the interactive fixed
effects model. If we only apply factor decomposition to the regressors, the DMDFM
becomes the classical factor model.

Compared with the multifactor error structure model of Pesaran (2006), the DMDFM
considers both the individual effect of the regressors and the lag effect of the
dependent variable. If, in the factor decomposition, we extract the common factors $F$
and $G$ by the same method, the DMDFM becomes the multifactor error structure model.

Andrews (2005) proposes cross-section regression with common shocks, which generalizes
the classical common factor model, but that model only discusses common shocks to the
cross section without giving a specific form for the common factors. If we regard the
factor decomposition of the DMDFM as common shocks, the same conclusions as in Andrews
(2005) are obtained.

The forecasting idea of the DMDFM differs slightly from that of Stock and Watson
(2002): we introduce two types of factors to reflect correlation over time and across
individuals. The DMDFM generalizes the approach of Stock and Watson from multivariate
time series to panel data, so that more complex factors are considered.

3 Identification and assumption of DMDFM 

Generally, we assume that the number of individuals $N$ and the number of periods $T$
are very large when we investigate high dimensional panel data. Here we pay more
attention to large $N$ and $p$, where the dimensions of individuals and indicators are
very large. The relative size of $N$ and $p$ is not strictly restricted.

Problems of parameter estimation and identification arise when there are not enough
restrictions, in which case the values are not unique. For a factor model, proper
identification and estimation require more assumptions than for the classical panel
data model. We impose conditions on the factors and factor loadings; the constraints
also apply to the error term, the regressors and models (2), (5), (6) and (7).

Assumption A: (Identification)

a1. $\Lambda'\Lambda/p \to I_r$.

a2. $E(FF') = \Sigma_{FF'}$, where $\Sigma_{FF'}$ is a positive diagonal matrix of order
$r$; the subscript of $F_{it}$ is omitted for simplicity.

a3. $\Gamma'\Gamma/N \to I_s$.

a4. $E(G_tG_t') = \Sigma_{GG'}$, where $\Sigma_{GG'}$ is a positive definite diagonal
matrix of order $s$.

Because $F_{it}\Lambda' = F_{it}RR^{-1}\Lambda'$ and $G_t\Gamma_i' = G_tQQ^{-1}\Gamma_i'$,
where $R$ and $Q$ can be arbitrary invertible matrices of order $r$ and $s$, the factor
decompositions of the regressors and of the error terms are not unique unless some
constraints are imposed on them. Assumptions a1 and a2 restrict the first group of
common factors $F_{it}$ and factor loadings $\Lambda$. Assumptions a3 and a4 restrict
the second group of common factors $G_t$ and factor loadings $\Gamma_i$. Stock and
Watson (2002) argue that assumptions a2 and a4 ensure covariance stationarity if we
introduce lag terms of the common factors $F_{it}$ and $G_t$ in the dynamic factor model
(5). Bai (2009) proposes invertibility assumptions on the coefficient matrix for the
identification and estimation of the parameters $\beta_L$ and $\beta_F$.
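
The rotation indeterminacy described above can be checked numerically. The following is a minimal sketch, not part of the original model code: the helper name `rotate_factors`, the matrix sizes and the particular matrix `R` are illustrative assumptions. It shows that a rotated pair of factors and loadings reproduces exactly the same common component, which is why normalizations such as a1-a4 are needed.

```python
import numpy as np

def rotate_factors(F, Lam, R):
    """Return the rotated pair (F R, Lam R^{-T}); its common component equals F Lam'."""
    return F @ R, Lam @ np.linalg.inv(R).T

rng = np.random.default_rng(0)
F = rng.normal(size=(50, 2))               # hypothetical factor scores, r = 2
Lam = rng.normal(size=(8, 2))              # hypothetical loadings, p = 8
R = np.array([[2.0, 1.0], [0.0, 0.5]])     # any invertible r x r matrix (assumed here)

F_rot, Lam_rot = rotate_factors(F, Lam, R)
same_fit = np.allclose(F @ Lam.T, F_rot @ Lam_rot.T)   # common component is unchanged
same_factors = np.allclose(F, F_rot)                   # the factors themselves differ
```

Since the data only identify the product of factors and loadings, any estimation routine has to fix the rotation by convention before the factors can be interpreted or compared.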

Assumption B: (Factors and factor loadings)

b1. $\|\lambda_i\| < \lambda_{\max} < \infty$.

b2. $E\|F\|^4 < \infty$, and $T^{-1}\sum_t FF' \xrightarrow{p} \Sigma_{FF'}$; the
subscript of $F_{it}$ is omitted for simplicity.

b3. $\|\gamma_i\| < \gamma_{\max} < \infty$, $E\|G_t\|^4 < \infty$.

The Frobenius norm of a matrix $F$ is defined as $\|F\| = [\mathrm{tr}(F'F)]^{1/2}$,
where $\mathrm{tr}(\cdot)$ denotes the trace of a matrix. Assumptions b1-b3 ensure that
the common factors $F_{it}$ and $G_t$ and the corresponding factor loadings are bounded.
Bai and Ng (2002) argue that such conditions on the factors and factor loadings
standardize the factor model and improve the efficiency of the factor decomposition of
the original variables.

Assumption C: (Error components)

c1. $E(\varepsilon_{it}) = 0$, $Var(\varepsilon_{it}) = \sigma^2$,
$E(Y_{it}Y_{it+h}) = \rho_i(h)$,
$\lim_{N\to\infty}\sup_t \sum_{i=1}^{N}\|\rho_i(h)\| < M < \infty$.

c2. $E(Y_{it}Y_{jt}) = \tau_t(k)$.

c3. For every $(t, s)$,
$E\left|N^{-1/2}\sum_i [\varepsilon_{is}\varepsilon_{it} - E(\varepsilon_{is}\varepsilon_{it})]\right|^4 < M < \infty$.

c4. $\lim_{N\to\infty}\sup_i \sum_{i,j}\sum_{s,t,u,v}\|cov(\varepsilon_{is}\varepsilon_{it}, \varepsilon_{ju}\varepsilon_{jv})\| < M < \infty$.

The assumptions on the error term and its moments concern three parts: the mean, the
variance and moment conditions; they are also called weak correlation assumptions.
Assumption c1 restricts the time series correlation to be weak and fixes the mean of the
error term that remains after the two factor decompositions; this weak correlation
prepares the later discussion of the dynamic factor model. Assumption c2 represents
cross-section correlation. Assumption c3 gives a higher-order moment condition with a
uniform bound. Assumption c4 bounds the time series and cross-section (TS/CS)
covariances and is stricter than c1-c3.

The idiosyncratic errors $e_{it}$ and $\varepsilon_{it}$, from the regressors $X_{it}$
and the error term $u_{it}$ respectively, must satisfy the assumptions of factor
decomposition, i.e., the idiosyncratic errors are mutually independent with mean 0 and
a diagonal covariance matrix (all off-diagonal elements equal to 0).

Assumption D: (Dependent variable, common factors and model parameters)

d1. $E(Y_{iw}'G_t) \neq 0$, $E(G_t'F_{it}) \neq 0$.

d2. $E[G_t\varepsilon_{it}(G_t\varepsilon_{it})'] = T^{-1}\sum_s\sum_t (G_tG_s'\varepsilon_{is}\varepsilon_{it})$ (as $T \to \infty$),
$E[G_tF_{it}(G_tF_{it})'] = \Sigma_{FG}$, $E[(Y_{iw}'F_{it})'Y_{iw}'F_{it}] = \Sigma_{YF}$,
where $\Sigma_{FG}$ and $\Sigma_{YF}$ are block diagonal positive definite matrices.

d3. $\|\beta_L\| < \infty$, $\|\beta_F\| < \infty$.

Assumption D concerns the relationship between the regressors and the error term and
provides the key conditions used for parameter estimation. Assumption d1 reflects the
correlation among the regressors in model (5); assumption C has already imposed weak
correlation on the other variables. Assumption d2 is very strong and ensures that model
(5) can be estimated. Assumption d3 bounds $\beta_L$ and $\beta_F$.

Assumptions A-D describe the inner structure of models (2)-(7) and guarantee that every
model can be estimated. We next study how to estimate the model and discuss the
asymptotic properties of the estimators under large $N$ and large $p$.

4 Model estimation

4.1 Factor decomposition and choice of the number of factors

The DMDFM extracts factors twice, so the method of factor decomposition and the choice
of the number of factors are very important. Many studies have discussed the choice of
lag orders and of the number of factors, but the schemes they propose apply only to
lags of the factors. For example, following the generalized dynamic factor model (GDFM)
of Forni et al. (2000), Hallin and Liska (2007) propose a valid information criterion
for choosing the number of common factors; their method is based on the decomposition
of the spectral density matrix. Harding and Nair (2009) exploit random matrix theory
and the Stieltjes transform to derive a unified estimation procedure that determines
the lag orders and the number of common factors of the common shocks component. This is
called the dynamic scree plot method, where the GDFM is expressed as a sum over lags of
the factors starting at $i = 0$ (the displayed equation is not legible in the source),
where $R_t$ is an $N \times 1$ vector. "Dynamic" there refers to the lag effect of the
factors, which differs from the dynamic model of the dependent variable in this article.

We extract factors twice in this paper. First, we use equation (2) to handle weak
correlation and reduce the dimension across individuals, where the common factors are
composed of the common shocks to different individuals. Second, we use the classical
PCA method for the factor decompositions in equations (2) and (6). We apply two
different methods to choose the number of factors. The number of regressor factors in
model (6) can be chosen by the nonparametric scree plot method, because the common
factors of model (6) are extracted from a large number of indicators as in multivariate
analysis, and the number of factors determined by the scree plot through the variance
contribution rate retains as much of the indicators' information as possible.

Remark 1: We extract factors in every period and obtain numbers of factors that vary
across periods. It is important to choose a unified number of factors, which improves
the efficiency of the analysis. Here, we use the maximum variance contribution rate to
determine the number of common factors.
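
As an illustration of the variance contribution rule, the sketch below picks the smallest number of principal components whose cumulative share of variance reaches a threshold, and then unifies the per-period choices. The function names, the 0.85 threshold and the use of the largest per-period value are assumptions made for this example; they are one possible reading of Remark 1, not a rule stated in the article.

```python
import numpy as np

def n_factors_by_variance(X, threshold=0.85):
    """Smallest r whose leading principal components explain >= threshold of the variance.

    X is an (n_obs, p) matrix of indicators; the threshold value is an assumption.
    """
    Xc = X - X.mean(axis=0)                          # centre each indicator
    s = np.linalg.svd(Xc, compute_uv=False)          # singular values, descending
    share = np.cumsum(s**2) / np.sum(s**2)           # cumulative variance contribution
    r = int(np.searchsorted(share, threshold) + 1)   # first index reaching the threshold
    return min(r, len(s))

def unified_n_factors(X_by_period, threshold=0.85):
    """Unify per-period choices by taking the largest one (an assumed reading of Remark 1)."""
    return max(n_factors_by_variance(X_t, threshold) for X_t in X_by_period)
```

Taking the largest per-period value errs on the side of keeping information; a stricter or looser threshold changes the result, so the choice should be reported alongside the estimates.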



Determining the number of factors of the idiosyncratic error $u_{it}$ is more
complicated, because it carries the information that remains after several
transformations. Bai and Ng (2002) propose two types of selection criteria for the
number of factors in panel data, both derived from the information criterion ($C_p$) of
Mallows (1973).

One of them is the panel $C_p$ criterion ($PC_p$), which has three variants; the basic
one is:

$$PC_{p1}(k) = V(k, F^k) + k\hat\sigma^2\left(\frac{N+T}{NT}\right)\ln\left(\frac{NT}{N+T}\right)$$

where $V(k, F^k) = (NT)^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T}(u_{it} - \lambda_i^{k\prime}F_t^k)^2$
is the average squared residual when $k$ factors are fitted. The $PC_p$ criteria
minimize the sum of squared errors plus a penalty function. $PC_{p2}$ and $PC_{p3}$ are
similar to $PC_{p1}$.

The other is the panel information criterion ($IC_p$). Corresponding to $PC_p$, it also
has three variants, one of which is:

$$IC_{p1}(k) = \ln V(k, F^k) + k\left(\frac{N+T}{NT}\right)\ln\left(\frac{NT}{N+T}\right)$$

The advantage of this criterion is that it does not depend on the error variance
estimate $\hat\sigma^2$, which may extend its scope of application. Bai and Ng (2002)
argue that both the $IC_p$ and the $PC_p$ criteria can be used to choose the number of
factors in panel data analysis.

The $PC_p$ and $IC_p$ information criteria can both be used to choose the number of
factors for panel data. The DMDFM extracts factors twice. Equation (6) is a multivariate
PCA decomposition, whereas equation (2) is a panel data factor decomposition of the
error component. In the decomposition of the idiosyncratic error $u_{it}$, the number of
factors is chosen by minimizing the $PC_p$ and $IC_p$ criteria. The number of regressor
factors is chosen by the variance contribution method or the scree plot method.
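
A minimal sketch of the $IC_{p1}$ selection rule of Bai and Ng (2002), $\ln V(k) + k\,\frac{N+T}{NT}\ln\frac{NT}{N+T}$, is given below. The function name `select_factors_icp1`, the `kmax` argument and the use of a singular value decomposition to obtain the rank-$k$ residual are choices made for this illustration; the sketch assumes a balanced $T \times N$ residual panel and `kmax` smaller than both $N$ and $T$.

```python
import numpy as np

def select_factors_icp1(U, kmax):
    """Choose the number of factors in a T x N panel U by minimizing IC_p1.

    Returns the selected k and the criterion values for k = 0, ..., kmax.
    Assumes kmax < min(T, N) so that the residual variance stays positive.
    """
    T, N = U.shape
    s2 = np.linalg.svd(U, compute_uv=False) ** 2       # squared singular values
    total = s2.sum()
    penalty = (N + T) / (N * T) * np.log(N * T / (N + T))
    ic = []
    for k in range(kmax + 1):
        # Mean squared residual of the best rank-k (principal components) fit.
        V_k = (total - s2[:k].sum()) / (N * T)
        ic.append(np.log(V_k) + k * penalty)
    ic = np.array(ic)
    return int(np.argmin(ic)), ic
```

In the two-stage procedure of this article the input would be the residual panel from the first-stage regression; the criterion values are returned so that the flatness of the minimum can be inspected.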

4.2 Estimation processes of DMDFM

The estimation of the DMDFM (2)-(7) can be divided into the following four steps:
first, extract factors from the regressors $X_{it}$; second, estimate model (7); third,
extract factors from the error term $u_{it}$; finally, estimate model (5). The two
estimation steps differ from each other in how they are carried out, as do the two
factor decomposition steps.

First, we reduce the dimension of the multiple indicators of the regressors $X_{it}$
from $p$ to $r$ ($r < p$), where the number of factors $r$ is determined by the variance
contribution rate. The result can be expressed as:

$$X_{it} = F_{it}\Lambda' + e_{it} \qquad (9)$$

Remark 2: The common factors $F_{it}$ and factor loadings $\Lambda$ are unobservable,
and the information in the regressors $X_{it}$ is reflected through the common factors
$F_{it}$. Here, we use factor scores rather than the common factors themselves when
estimating the equations. Factor scores can be obtained by weighted least squares or
other methods.
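
One simple way to obtain factor scores is sketched below, assuming principal components computed by a singular value decomposition and the normalization $\Lambda'\Lambda/p = I_r$. The function name `pca_factor_scores` is hypothetical, and ordinary least-squares scores are used here in place of the weighted least squares mentioned in Remark 2, so this is an illustration rather than the article's exact procedure.

```python
import numpy as np

def pca_factor_scores(X, r):
    """Extract r principal-component factors from an (n_obs, p) regressor matrix.

    Returns scores F (n_obs, r) and loadings Lam (p, r), scaled so that
    Lam.T @ Lam / p is the identity matrix. Requires r <= min(n_obs, p).
    """
    p = X.shape[1]
    Xc = X - X.mean(axis=0)                           # centre each indicator
    # A thin SVD works whether p is smaller or larger than n_obs.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Lam = np.sqrt(p) * Vt[:r].T                       # p x r loadings
    F = Xc @ Lam / p                                  # least-squares scores given Lam
    return F, Lam
```

The product `F @ Lam.T` is then the best rank-`r` approximation of the centred regressors, and the columns of `F` can stand in for the `p` original indicators in the regression step.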

Next, substitute $F_{it}$ and the lag terms of $Y_{it}$ into model (7), and use the
Generalized Method of Moments (GMM) to obtain the initial parameter estimators
$\hat\beta_L$ and $\hat\beta_F$. Then calculate the residuals of model (7) from the GMM
estimation results:

$$\hat u_{it} = Y_{it} - \hat Y_{it} = Y_{it} - Y_{iw}'\hat\beta_L - F_{it}\hat\beta_F$$

Then extract factors from $\hat u_{it}$, using the $PC_p$ and $IC_p$ criteria to
determine the number of common factors $s$. The result of the decomposition can be
written as:

$$\hat u_{it} = G_t\Gamma_i' + \varepsilon_{it} \qquad (10)$$

Finally, substitute the results of the two factor decompositions into model (5) and
estimate model (5) to obtain the estimated parameters $\hat\beta_L$ and $\hat\beta_F$
and the prediction equation:

$$\hat Y_{it} = Y_{iw}'\hat\beta_L + F_{it}\hat\beta_F + \hat G_t\hat\Gamma_i' \qquad (11)$$

When estimating model (5), assumptions a3 and a4 normalize $\Gamma'\Gamma/N$ to $I_s$,
which provides the identification condition for the common factors and factor loadings.
At the same time, equation (10) provides the decomposition into common factors $G_t$
and factor loadings $\Gamma_i$, so $G_t\Gamma_i'$ in equation (11) can be treated as
observable. We take into account the correlation between the lag terms and the
regressors when we estimate model (5). Thus, we use GMM to estimate the parameters of
model (11).

The four-step method above consists of two factor decomposition steps and two model
estimation steps. The first factor decomposition reduces the dimension of the
indicators, so that a few typical factors and their scores represent all covariates and
their values. The second factor decomposition mainly reflects the idiosyncratic and
interactive effects of individuals and periods. Of the two estimation steps, the first
extracts the idiosyncratic errors from which the interactive-effect factors are
obtained, and the second yields a consistent estimator of model (5). Choosing the
correct estimation method for the given model is very important; otherwise the
estimation results will be incorrect. Here, we apply the generalized method of moments
(GMM).

4.3 Realization of estimation processes 

Model (5) includes lag terms of the common factors and of the dependent variable, so it
is difficult to obtain strong uniform convergence results with maximum likelihood
estimation. Arellano and Bond (1991) consider GMM estimation of panel data
autoregressive models with individual random effects, independent strictly exogenous
variables and predetermined variables. Arellano and Bover (1995) develop instrumental
variable selection through GMM estimation in panel data models that include
predetermined variables, and they characterize the valid transformations for exogenous
variables. GMM is more flexible for estimating panel data models with lags and exogenous
variables, and it can also serve as a consistent parameter estimation method for the
DMDFM.

We need to determine the moment conditions and choose optimal instrumental variables
in order to estimate the panel data DMDFM by GMM. Without loss of generality, we only
discuss the AR(1) process of the dependent variable below. Model (5) can then be
written as:

$$Y_{it} = Y_{it-1}\rho + F_{it}\beta_F + G_t\Gamma_i' + \varepsilon_{it} \qquad (12)$$

Because the common factor $G_t$ and factor loading $\Gamma_i$ are obtained from the
decomposition in equation (10), they are observable when model (12) is estimated.
Denoting them by $\hat G_t$ and $\hat\Gamma_i$, model (12) becomes

$$Y_{it} = Y_{it-1}\rho + F_{it}\beta_F + \hat G_t\hat\Gamma_i' + \varepsilon_{it} \qquad (13)$$

For simplicity, we still use $\varepsilon_{it}$ to denote the error component in model
(13). Following Arellano and Bond (1991) and Hsiao (2003), the instrumental variables
may be chosen from lag terms of the dependent variable (predetermined variables) and
exogenous variables. For model (13), the instrumental variables should be correlated
with the explanatory variables and orthogonal to the residual terms. Applying the
first-order difference transformation to model (13), we obtain

$$Y_{it} - Y_{it-1} = (Y_{it-1} - Y_{it-2})\rho + (F_{it} - F_{it-1})\beta_F + (\hat G_t - \hat G_{t-1})\hat\Gamma_i' + \varepsilon_{it} - \varepsilon_{it-1}$$

Here $(\hat G_t - \hat G_{t-1})\hat\Gamma_i'$ is an observable scalar variable; it can be
combined with the constant term when we estimate model (12), which amounts to adding a
constant term to model (12).

Remark 3: We assume that the factor decomposition of the error component can be
absorbed into the constant term, so that it can be regarded as a constant factor among
the common factors $F_{it}$. In that case we should replace $F_{it}$ with new notation.
For the sake of brevity, we keep the same notation as before, but the factorization
results of the error components are regarded as included in the error terms of model
(13).

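The first-order difference transformation of model (13) can be written as

$$Y_{it} - Y_{it-1} = (Y_{it-1} - Y_{it-2})\rho + (F_{it} - F_{it-1})\beta_F + \varepsilon_{it} - \varepsilon_{it-1}$$

or, using the difference operator $\Delta$,

$$\Delta Y_{it} = \Delta Y_{it-1}\rho + \Delta F_{it}\beta_F + \Delta\varepsilon_{it} \qquad (14)$$

The lag terms of $Y_{it}$, $Y_{it-2-j}$ ($j = 0, 1, 2, \dots, t-2$), satisfy
$E[Y_{it-j-2}(Y_{it-1} - Y_{it-2})] \neq 0$ and
$E[Y_{it-j-2}(\varepsilon_{it} - \varepsilon_{it-1})] = 0$. For the $i$th individual,
this gives $T(T-1)/2$ moment conditions. The differences of the error term,
$(\varepsilon_{it} - \varepsilon_{it-1})$, $t = 2, \dots, T$, are denoted by
$\Delta\varepsilon_i$. The $r$ explanatory variables $F_{it}$ have similar features to
$Y_{it-2-j}$:

$$E[F_{it}\Delta\varepsilon_i] = 0, \quad t = 1, \dots, T$$

Thus, we obtain $r \times T \times (T-1)$ moment conditions for the $i$th individual, so
that the predetermined variables and exogenous variables together determine
$T(T-1)/2 + r \times T \times (T-1)$ moment equations for the residual term. Denote

$$H_{it} = (Y_{i0}, \dots, Y_{it-2}, F_{i1}, \dots, F_{iT})$$

Then the $T(T-1)/2 + r \times T \times (T-1)$ moment equations can be written as:

$$E[H_{it}\Delta\varepsilon_{it}] = 0, \quad t = 2, \dots, T$$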

These moment equations provide moment conditions on the error terms. For simplicity,
omit the subscript $t$ for all variables and write the model in matrix form:

$$\Delta Y_i = \Delta Y_{i,-1}\rho + \Delta F_i\beta_F + \Delta\varepsilon_i, \quad i = 1, \dots, N \qquad (15)$$

Denote

$$Z_i = \begin{pmatrix} H_{i2} & & & \\ & H_{i3} & & \\ & & \ddots & \\ & & & H_{iT} \end{pmatrix}$$

For the $i$th individual, the previous moment equations can be written as:

$$E[Z_i\Delta\varepsilon_i] = 0, \quad i = 1, \dots, N \qquad (16)$$

Because the number of moment equations in (16), $T(T-1)/2 + r \times T \times (T-1)$, is
much larger than the number of parameters to be estimated in model (15), $r + 1$, we
impose some restrictions on it. The residual sum of squares of model (15) is defined as
follows:

$$V(\Delta Y, \Delta F; \rho, \beta) = \sum_{i=1}^{N}(\Delta Y_i - \Delta Y_{i,-1}\rho - \Delta F_i\beta_F)'(\Delta Y_i - \Delta Y_{i,-1}\rho - \Delta F_i\beta_F) \qquad (17)$$

We can obtain the uniformly optimal estimator of the unknown parameters by minimizing
objective function (17). With too many moment conditions, the moment equations (16)
have no solution. To obtain valid conditions for parameter estimation, we seek a
positive definite matrix $A$ that transforms objective function (17) into:

$$V(\Delta Y, \Delta F; \rho, \beta) = \sum_{i=1}^{N}(\Delta Y_i - \Delta Y_{i,-1}\rho - \Delta F_i\beta_F)'A(\Delta Y_i - \Delta Y_{i,-1}\rho - \Delta F_i\beta_F) \qquad (18)$$

By minimizing objective function (18) with an appropriately chosen positive definite
matrix, we obtain the estimators $\hat\rho$ and $\hat\beta_F$ of the parameters $\rho$
and $\beta_F$. The covariance matrix of $Z_i\Delta\varepsilon_i$ is:

$$V_N = N^{-1}\sum_{i=1}^{N}E(Z_i\Delta\varepsilon_i\Delta\varepsilon_i'Z_i')$$

which can be estimated by:

$$\hat V_N = N^{-1}\sum_{i=1}^{N}Z_i\Delta\hat\varepsilon_i\Delta\hat\varepsilon_i'Z_i'$$

From the results of Hansen (1982), the optimal choice $A_0$ of the positive definite
matrix $A$ is $V_N^{-1}$. From assumption C above, the error term $\varepsilon_{it}$ is
i.i.d. with mean 0 and variance $\sigma^2$, so we have:

$$A_0 = \left(N^{-1}\sum_{i=1}^{N}Z_iUZ_i'\right)^{-1}$$
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According to the one step estimation method of Arellano and Bond (1991), known 
transformation matrix can't extract the information of error term thoroughly. We con- 
sider using two-step estimation method, utilize the residual ef' of first step estimation 
to construct transformation matrix Ui — ^f^i e^^'^el^^ ■ Then we minimize objective 
function (18) and obtain the estimators of p and similar with Arellano and Bond 
(1991): 

$$(\hat\rho, \hat\beta_F) = \big((\Delta Y_{-1}, \Delta F)' Z' A_0 Z (\Delta Y_{-1}, \Delta F)\big)^{-1} (\Delta Y_{-1}, \Delta F)' Z' A_0 Z \Delta Y \qquad (19)$$

where $\Delta Y_{-1}$ and $\Delta F$ are an $N(T-1)$ vector and an $N(T-1) \times r$ matrix respectively, representing the predetermined variables and the exogenous variables. These two types of variables can be estimated separately or simultaneously as explanatory variables. $A_0$ and $Z$ are as defined previously, and denote the optimal choice of transformation matrix and the weighting matrix respectively. $Z$ is a block diagonal matrix composed of the instrumental variables.
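
The two-step logic behind (19) can be sketched numerically. The Python snippet below is a minimal illustration under simplifying assumptions, not the authors' implementation: it uses a generic linear model with one endogenous regressor and three instruments rather than the differenced DMDFM, stacks observations in rows (so the moment vector is $Z'(y - Xb)$, the transpose of the orientation used in the text), takes $(Z'Z)^{-1}$ as the known first-step weight, and rebuilds the weight from first-step residuals summed within each individual. The function names `linear_gmm` and `two_step_gmm` and the toy data are hypothetical.

```python
import numpy as np

def linear_gmm(y, X, Z, A):
    """Minimise (Z'(y - Xb))' A (Z'(y - Xb)) over b (closed form)."""
    ZX = Z.T @ X  # instrument-regressor cross moments, shape (m, k)
    Zy = Z.T @ y  # instrument-response cross moments, shape (m,)
    return np.linalg.solve(ZX.T @ A @ ZX, ZX.T @ A @ Zy)

def two_step_gmm(y, X, Z, groups):
    """Two-step GMM; `groups` gives the individual index of each row."""
    # Step 1: a known weight matrix (here (Z'Z)^{-1}, an assumption).
    b1 = linear_gmm(y, X, Z, np.linalg.inv(Z.T @ Z))
    # Step 2: weight = inverse of the average outer product of the
    # per-individual moment contributions Z_i' e_i from step 1.
    e = y - X @ b1
    ids = np.unique(groups)
    V = np.zeros((Z.shape[1], Z.shape[1]))
    for g in ids:
        rows = groups == g
        s = Z[rows].T @ e[rows]
        V += np.outer(s, s)
    b2 = linear_gmm(y, X, Z, np.linalg.inv(V / len(ids)))
    return b1, b2

# Toy data (assumed): 500 individuals, 4 periods, true coefficient 2.0.
rng = np.random.default_rng(0)
n_ind, n_per = 500, 4
groups = np.repeat(np.arange(n_ind), n_per)
Z = rng.normal(size=(n_ind * n_per, 3))
u = rng.normal(size=n_ind * n_per)                      # structural error
x = Z.sum(axis=1) + 0.8 * u + rng.normal(size=u.size)   # endogenous regressor
y = 2.0 * x + u
X = x[:, None]

b1, b2 = two_step_gmm(y, X, Z, groups)
ols = np.linalg.lstsq(X, y, rcond=None)[0]              # biased benchmark
print(b1, b2, ols)
```

In this toy setting the regressor is correlated with the error, so least squares is biased upward, while both GMM steps recover a value close to the true coefficient; the second step only changes how the moment conditions are weighted.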

4.4 Theory results 

GMM estimation solves the population moment equations through sample moment conditions. In the over-identified case, we transform them to the just-identified case through the weighting matrix or transformation matrix $A$. If the optimal weighting matrix and the instrumental variable matrix are chosen correctly, the GMM estimators are consistent and asymptotically normal. The sample estimators of the parameters in equation (19), obtained by minimizing objective function (18), can be written as:

$$(\hat\rho, \hat\beta_F) = \Big\{\Big[\sum_i (\Delta Y_{i,-1}, \Delta F_i)' Z_i'\Big] \Big[\sum_i Z_i A_0 Z_i'\Big]^{-1} \Big[\sum_i Z_i (\Delta Y_{i,-1}, \Delta F_i)\Big]\Big\}^{-1}$$

$$\times \Big[\sum_i (\Delta Y_{i,-1}, \Delta F_i)' Z_i'\Big] \Big[\sum_i Z_i A_0 Z_i'\Big]^{-1} \Big[\sum_i Z_i \Delta Y_i\Big] \qquad (20)$$

The RHS of model (5) includes higher-order lag terms of the dependent variable $Y_{it}$, which are used as IVs in GMM estimation to obtain consistent and efficient estimators of the regression parameters. When the previous assumptions are satisfied, we can draw the more general conclusions below.

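Theorem 1. (Consistency) Suppose assumptions A-D are satisfied, and let the GMM estimators $\hat\beta_L$ and $\hat\beta_F$ be the estimators of the lag-term parameter $\beta_L$ and the common factor parameter $\beta_F$ respectively. Suppose the number of explanatory variables $p$ and the period length $T$ are given. Then, as $N \to \infty$, the following conclusions hold:

(1) $\hat\beta_L - \beta_L \xrightarrow{a.s.} 0$, $\quad \hat\beta_F - \beta_F \xrightarrow{a.s.} 0$

(2) $Y_{i,t-1}\hat\beta_L + F_{it}\hat\beta_F + \hat G_t \hat\Gamma_i' - (Y_{i,t-1}\beta_L + F_{it}\beta_F + G_t \Gamma_i') \xrightarrow{a.s.} 0$

The proof of Theorem 1 can be found in Appendix A.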

Conclusion (1) of Theorem 1 indicates that the coefficient estimators of the predetermined variables and the exogenous explanatory variables converge w.p.1 to the true parameters as the sample size tends to infinity. Conclusion (2) demonstrates consistent estimation of the full model; for more detail see the proof, where the expressions of the estimators follow by analogy from equations (19) and (20). Since the parameter estimates are consistent, they can be applied to extrapolation and prediction.

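Suppose the random error term $e_{it}$ is i.i.d. normal with mean 0 and variance $\sigma^2$, and the optimal transformation matrix $A_0$ and weighting matrix $Z$ are chosen in GMM estimation. We then obtain

$$\operatorname{avar}(\hat\rho, \hat\beta_F) = \Big\{\Big[\sum_i (\Delta Y_{i,-1}, \Delta F_i)' Z_i'\Big] \Big[\sum_i Z_i A_0 Z_i'\Big]^{-1} \Big[\sum_i Z_i (\Delta Y_{i,-1}, \Delta F_i)\Big]\Big\}^{-1} \qquad (21)$$

which is the asymptotic variance of the estimator.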
Rewrite objective function (18) as:

$$O_N = \sum_{i=1}^{N} (\Delta Y_i - \Delta Y_{i,-1}\beta_L - \Delta F_i\beta_F)' A (\Delta Y_i - \Delta Y_{i,-1}\beta_L - \Delta F_i\beta_F) \qquad (22)$$

Taking the first-order partial derivatives of the objective function $O_N$ with respect to the parameters $\beta_L$ and $\beta_F$ gives

$$R_L = \partial O_N / \partial \beta_L \quad \text{and} \quad R_F = \partial O_N / \partial \beta_F$$

where $R(\beta_L, \beta_F) = (R_L', R_F')'$ collects the first-order partial derivatives with respect to the parameters to be estimated. The estimates in (20) are obtained by minimizing objective function (18), and converge consistently to (19). Furthermore, suppose the random matrix $R$ converges to a matrix $R_1$ w.p.1, and denote

$$\Sigma_1 = (R_1' A_0 R_1)^{-1} R_1' A_0 D_1 A_0 R_1 (R_1' A_0 R_1)^{-1}$$

where $D_1$ is the asymptotic variance of $\sqrt{N} O_N$ as $N \to \infty$,

$$\sqrt{N} O_N \xrightarrow{d} N(0, D_1) \qquad (23)$$

that is, we assume that $\sqrt{N} O_N$ converges in distribution to a normal distribution with mean 0.
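
As a numerical illustration of the sandwich form of $\Sigma_1$, the following Python sketch evaluates the formula for an arbitrary weight and for the weight $A_0 = D_1^{-1}$. The matrices are randomly generated stand-ins for $R_1$ and $D_1$ (they are assumptions for illustration and are not taken from the paper), and the function name `sandwich` is hypothetical. For the optimal weight, the expression collapses to $(R_1' D_1^{-1} R_1)^{-1}$, which is the standard efficiency property of optimally weighted GMM.

```python
import numpy as np

def sandwich(R, A, D):
    """(R'AR)^{-1} R'A D A R (R'AR)^{-1} for a symmetric weight matrix A."""
    bread = np.linalg.inv(R.T @ A @ R)
    meat = R.T @ A @ D @ A @ R
    return bread @ meat @ bread

rng = np.random.default_rng(1)
m, k = 6, 2                      # number of moments and parameters (assumed)
R = rng.normal(size=(m, k))      # stand-in for the limit matrix R_1
B = rng.normal(size=(m, m))
D = B @ B.T + m * np.eye(m)      # stand-in for D_1, positive definite

sigma_identity = sandwich(R, np.eye(m), D)        # arbitrary weight A = I
sigma_optimal = sandwich(R, np.linalg.inv(D), D)  # optimal weight A = D^{-1}
closed_form = np.linalg.inv(R.T @ np.linalg.inv(D) @ R)

# The difference should be positive semidefinite (efficiency of A = D^{-1}).
gap_eigenvalues = np.linalg.eigvalsh(sigma_identity - sigma_optimal)
print(np.allclose(sigma_optimal, closed_form), gap_eigenvalues.min())
```

The eigenvalues of the difference between the two covariance matrices are non-negative up to rounding error, so no other weight gives a smaller asymptotic variance than $D_1^{-1}$.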

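The above analysis is based on short panel data ($T < N$). Now consider long panel data, in which the period length $T$ and the number of individuals $N$ tend to infinity simultaneously, with $R \xrightarrow{a.s.} R_2$. Let

$$O_N = (NT)^{-1} \sum_{i=1}^{N} (\Delta Y_i - \Delta Y_{i,-1}\beta_L - \Delta F_i\beta_F)' A (\Delta Y_i - \Delta Y_{i,-1}\beta_L - \Delta F_i\beta_F)$$

with the other notation unchanged, and denote

$$\Sigma_2 = (R_2' A_0 R_2)^{-1} R_2' A_0 D_2 A_0 R_2 (R_2' A_0 R_2)^{-1}$$

When $N, T \to \infty$, assume

$$\sqrt{NT}\, O_N \xrightarrow{d} N(0, D_2) \qquad (24)$$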

Under the given correlation assumptions, the GMM estimators of the dynamic double factor model are asymptotically normal as the period length $T \to \infty$. The conclusions are stated in Theorem 2.

Theorem 2. (CLT) Given positive definite matrices $\Sigma_1$ and $\Sigma_2$, under the assumption conditions, the following conclusions hold:

(1) If the explanatory variables have serial correlation, the dependent variable has cross-section correlation, $N \to \infty$ and $T$ is fixed, so that $T/N \to 0$ (short panel data), then

$$\sqrt{N}\,[(\hat\beta_L, \hat\beta_F) - (\beta_L, \beta_F)] \xrightarrow{d} N(0, \Sigma_1).$$

(2) If the explanatory variables have serial correlation, the dependent variable has no cross-section correlation, $N, T \to \infty$ and $T/N \to C$, where $C \neq 0$ is a constant (long panel data), then

$$\sqrt{NT}\,[(\hat\beta_L, \hat\beta_F) - (\beta_L, \beta_F)] \xrightarrow{d} N(0, \Sigma_2).$$

The proof of Theorem 2 is given in Appendix B.

The conclusions of Theorem 2 give the asymptotic normality of the sample estimators for a short panel ($T \ll N$) and a long panel ($T$ and $N$ are close). The values of $\Sigma_1$ and $\Sigma_2$ depend closely on the asymptotic variances $D_1$ and $D_2$ of $\sqrt{N} O_N$. The optimal weighting matrix $A_0$ is generally replaced by an arbitrarily given matrix in order to obtain $D_1$ and $D_2$, so $D_1$ and $D_2$ depend mainly on the variance of the random error term. Furthermore, we assume

$$E(e_{it} e_{i,t+h}) = 0 \quad (h \neq 0)$$

and $\mathrm{Var}(e_{it}) = \sigma^2$, so the influence of the disturbance variance on the asymptotic variance of the estimator varies with the estimation method of the given model. The choice of IVs and weighting matrix also influences the asymptotic variance. If

$$E[\Delta e_i \mid Z_i] = 0, \quad i = 1, \ldots, N$$

then the interaction between the error term and the IVs is not considered; this condition is stronger than $E[\Delta e_i Z_i] = 0$.

Obviously, choosing different IVs $Z$ also influences the asymptotic variance of $\sqrt{N} O_N$, and hence $\Sigma_1$ and $\Sigma_2$, so a different number of IVs will give different estimation results. For GMM estimation, appropriate IVs come from higher-order lag terms and exogenous variables, so it is important to choose the order of the lag terms. Meanwhile, if the estimator of every parameter is asymptotically normal, then by Slutsky's lemma the asymptotic properties of the sum of these estimators can be obtained.

5 Simulation Study 

DMDFM considers time series correlation and cross-section correlation simultaneously. To reflect these two types of correlation, the simulation allows the common factors of the error term to have lag effects. The common factors decomposed from the explanatory variables have individual correlation as well as serial correlation. The factor loadings mainly reflect individual correlation. The high-dimensional case includes a large number of explanatory variables, and we use a small number of common factors to extract their information and reduce the dimension. In the simulation, therefore, these common factors should reflect not only the correlation with the explanatory variables but also the lag effects of the explanatory variables. Consider the following data generation process (DGP):

$$y_{it} = \alpha_i + \beta_{l1} y_{i,t-1} + \beta_{f1} f_{1it} + \beta_{f2} f_{2it} + \gamma_{i1} g_{1t} + \gamma_{i2} g_{2t} + e_{it} \qquad (25)$$

Compared with model (5), the DGP adds some restrictions to reflect practical settings. It consists of five parts: the intercept; the first-order lag of the dependent variable; the common factors of the covariates; the common factors and factor loadings of the error components; and the idiosyncratic errors. As mentioned above, we choose two common factors from each group of factors.

The intercept terms are generated from a normal distribution:

$$\alpha_i \sim \text{i.i.d. } N(1, 2)$$

To reflect serial correlation, the error term of model (5) is generated by AR(1) processes:

$$e_{it} = \rho_i e_{i,t-1} + \eta_{it}$$

$$\rho_i \sim \text{i.i.d. } U(0.05, 0.95)$$

$$\eta_{it} \sim \text{i.i.d. } N(0, 1)$$

with a given initial value $e_{i0}$. This part of the errors represents the idiosyncratic error generated by the factor decomposition. From the factor decomposition in equation (2), the other part of the error components is reflected in the common factors and factor loadings of the error term. We assume that the common factors of the error component retain lagged factors, and express them as AR(1) processes driven by different idiosyncratic errors. The first-order correlation coefficients are generated from a uniform distribution, so the DGP of the two error component factors can be written as:

$$g_{jt} = \rho_{jt} g_{j,t-1} + u_{jt} \quad (j = 1, 2)$$

$$\rho_{jt} \sim \text{i.i.d. } U(0.05, 0.95)$$

$$u_{jt} \sim \text{i.i.d. } N(0, 1)$$

with a given initial value $g_{j,0}$.

The factor loadings of the error component are usually generated from a uniform or a normal distribution; here we use the uniform distribution:

$$\gamma_{ki1} \sim \text{i.i.d. } U(0.05, 0.95)$$

$$\gamma_{ki2} \sim \text{i.i.d. } U(0.05, 0.95)$$

The common factors extracted from the explanatory variables should reflect correlation among individuals, periods and explanatory variables. Every common factor of each individual retains the main information of the explanatory variables and the idiosyncratic component of the individual. The data generation process of each common factor therefore consists of four parts: a level term; an error factor term; an individual correlation component; and an error component. It can be generated by:

$$f_{kit} = a_{ki1} h_{k1t} + \gamma_{ki1} g_{1t} + \gamma_{ki2} g_{2t} + c_{kit} q_{i1} + u_{kit}$$

where the level term is composed of an individual random coefficient multiplied by an AR(1) process. The first-order autocorrelation coefficients and initial values of the AR(1) processes are given, and the remaining values are generated from the AR(1) processes. For the two common factors of the explanatory variables, the level terms are generated by:

$$a_{ki1} \sim \text{i.i.d. } U(0.05, 0.95)$$

$$h_{k1t} = \rho_{kh} h_{k,1,t-1} + \tau_{kh}$$

$$\rho_{1h} = 0.4, \quad h_{1,1,0} = 0.2, \quad \rho_{2h} = 0.5, \quad h_{2,1,0} = 0.3$$

$$\tau_{kh} \sim \text{i.i.d. } N(0, 1)$$

The random errors of the common factor terms are generated from a normal distribution:

$$u_{kit} \sim \text{i.i.d. } N(0, 0.25)$$

The individual correlation components are generated from a first-order spatial autoregression, SAR(1):

$$q_{i1} = \rho_q q_{i-1,1} + u_q$$

$$\rho_q \sim \text{i.i.d. } U(0.05, 0.95), \quad q_{0,1} = 0.1$$

$$u_q \sim \text{i.i.d. } N(0, 1)$$

The coefficients of the individual correlation components are generated from a uniform distribution:

$$c_{1it} \sim \text{i.i.d. } U(0.05, 0.95)$$

$$c_{2it} \sim \text{i.i.d. } U(0.05, 0.95)$$

The common factors of the explanatory variables retain the common factors of the error components to express the extracted information; their coefficients are likewise generated from a uniform distribution.

Based on the above, the simulation requires an initial value of $y_{it}$; we set $y_{i0} = 0$, and $\beta_{l1} = 0.6$, $\beta_{f1} = 0.8$, $\beta_{f2} = 1$. To ensure the consistency of the data generation process, we discarded the first 15 simulated values. Every experiment was replicated 2,000 times for $(N,T)$ = (20,5), (50,5), (50,10), (100,5), (100,10), (100,20), (200,5), (200,10), (200,20), (200,50) respectively. The estimates of the parameters $\beta_{l1}$, $\beta_{f1}$ and $\beta_{f2}$ are obtained from the 2,000 replications, and their mean bias and root mean square error (RMSE) are then calculated. The simulation results are summarized in Table 1.
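
The data generation process described above can be sketched in Python as follows. This is an illustrative reading rather than the authors' simulation code, and it rests on several assumptions: the initial values of the AR(1) errors and of the error-component factors are set to 0, the second argument of each normal distribution is read as a variance, the loadings $\gamma_{i1}, \gamma_{i2}$ in (25) are drawn from $U(0.05, 0.95)$, and each autoregressive coefficient is drawn once rather than in every period. The function name `simulate_dgp` is hypothetical.

```python
import numpy as np

def simulate_dgp(N, T, burn=15, seed=0):
    """Simulate y (N, T) and regressor factors f (2, N, T) from a DGP like (25)."""
    rng = np.random.default_rng(seed)

    def unif(size=None):
        return rng.uniform(0.05, 0.95, size)

    S = T + burn                                  # total simulated periods
    beta_l1, beta_f1, beta_f2 = 0.6, 0.8, 1.0

    # Error-component factors g_1t, g_2t: AR(1), assumed to start at 0.
    rho_g = unif(2)
    g = np.zeros((2, S + 1))
    for t in range(1, S + 1):
        g[:, t] = rho_g * g[:, t - 1] + rng.normal(size=2)

    # Idiosyncratic errors e_it: AR(1) per individual, assumed to start at 0.
    rho_e = unif(N)
    e = np.zeros((N, S + 1))
    for t in range(1, S + 1):
        e[:, t] = rho_e * e[:, t - 1] + rng.normal(size=N)

    # Individual correlation component: autoregression across i, q_0 = 0.1.
    rho_q = unif()
    q = np.empty(N)
    prev = 0.1
    for i in range(N):
        prev = rho_q * prev + rng.normal()
        q[i] = prev

    # Level terms h_kt with the stated coefficients and initial values.
    rho_h = np.array([0.4, 0.5])
    h = np.zeros((2, S + 1))
    h[:, 0] = [0.2, 0.3]
    for t in range(1, S + 1):
        h[:, t] = rho_h * h[:, t - 1] + rng.normal(size=2)

    # Regressor factors f_kit (k = 1, 2), shape (2, N, S + 1).
    a = unif((2, N, 1))
    lam = unif((2, N, 2))                         # loadings on g in f_kit
    c = unif((2, N, S + 1))
    noise = rng.normal(scale=0.5, size=(2, N, S + 1))   # variance 0.25
    f = (a * h[:, None, :] + np.einsum("knj,jt->knt", lam, g)
         + c * q[None, :, None] + noise)

    # Response y_it from (25), with y_i0 = 0.
    alpha = rng.normal(1.0, np.sqrt(2.0), size=N)  # N(1, 2) read as variance 2
    gam = unif((N, 2))                             # loadings on g in (25)
    y = np.zeros((N, S + 1))
    for t in range(1, S + 1):
        y[:, t] = (alpha + beta_l1 * y[:, t - 1] + beta_f1 * f[0, :, t]
                   + beta_f2 * f[1, :, t] + gam @ g[:, t] + e[:, t])

    # Drop the initial value and the burn-in periods.
    return y[:, burn + 1:], f[:, :, burn + 1:]

y, f = simulate_dgp(N=50, T=10)
print(y.shape, f.shape)
```

Repeating such a draw 2,000 times with different seeds and estimating the model on each draw would give the replications from which bias and RMSE are computed.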

Table 1. Bias and RMSE of simulation results

(N,T)       Bias                                                    RMSE
            $\hat\beta_{l1}$   $\hat\beta_{f1}$   $\hat\beta_{f2}$   $\hat\beta_{l1}$   $\hat\beta_{f1}$   $\hat\beta_{f2}$
(20,5)      -0.0981            -0.0333            -0.0319            0.01238            0.01313            0.01301
(50,5)      0.00131            0.02163            0.00161            0.00962            0.01013            0.01036
(50,10)     0.03613            0.00423            0.02274            0.00377            0.00655            0.00647
(100,5)     0.01293            0.02180            0.01876            0.00942            0.00909            0.00914
(100,10)    0.04707            0.00758            0.01898            0.00361            0.00607            0.00581
(100,20)    0.07718            0.00616            0.01829            0.00189            0.00459            0.00417
(200,5)     0.03012            0.02184            0.02596            0.00994            0.00808            0.00888
(200,10)    0.05712            0.00829            0.01733            0.00364            0.00591            0.00565
(200,20)    0.08152            0.00812            0.02530            0.00187            0.00432            0.00425
(200,50)    0.08032            0.01192            0.02292            0.00187            0.00448            0.00420


As can be seen from Table 1, for given values of $N$ and $T$, the first-order lag term of the dependent variable in DMDFM has small bias and RMSE, as do the coefficient estimates of the common factors of the explanatory variables. This indicates that GMM estimation can obtain consistent and efficient parameter estimators. Furthermore, regarding the relative size of the bias, the values of the dependent variable and the explanatory variables lie in the range (-20, 20). The biases in Table 1 are small relative to the true parameter values, so the estimators agree with the population parameters. These results are in line with the large sample properties of DMDFM and GMM estimation mentioned previously.

The simulation results show that the bias and RMSE of the regression coefficients do not vary noticeably as the number of individuals increases. The number of individuals increases from 20 to 200 and the number of periods from 5 to 50, but the bias does not increase with the number of individuals, which indicates that the estimates have good finite-sample properties. For the short panel, where the number of periods is smaller than the number of individuals, the bias and RMSE do not vary noticeably. According to the results of the Monte Carlo simulation, the bias of the estimators becomes smaller as the period length increases. In the above simulation, the number of individuals is at least 4 times the period length, which reflects the high-dimensional feature. The estimates thus show a fast uniform convergence rate.

The root mean square error (RMSE) contains information on both the sample bias and the variance. The results in Table 1 show that the bias and RMSE are both small. This implies that the variance is also small, because the MSE is the sum of the variance and the squared bias. The small variance of the estimates indicates that this estimation method yields an estimator that is not only consistent but also has small variance. This confirms consistency and efficiency once again.
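
The relation between bias, variance and RMSE used in this argument can be checked with a short Python sketch. The replications below are hypothetical draws generated for illustration only (they are not the simulation output behind Table 1), and the helper name `bias_rmse` is an assumption.

```python
import numpy as np

def bias_rmse(estimates, true_value):
    """Mean bias, RMSE and variance of Monte Carlo estimates of one parameter."""
    estimates = np.asarray(estimates, dtype=float)
    bias = estimates.mean() - true_value
    rmse = np.sqrt(np.mean((estimates - true_value) ** 2))
    variance = estimates.var()          # population variance (ddof=0)
    return bias, rmse, variance

# Hypothetical replications: 2,000 draws around a true value of 0.6.
rng = np.random.default_rng(0)
draws = 0.6 + 0.02 + 0.05 * rng.normal(size=2000)
bias, rmse, variance = bias_rmse(draws, 0.6)

# MSE = variance + bias^2 holds exactly when the variance uses ddof=0.
print(np.isclose(rmse ** 2, variance + bias ** 2))
```

Computing the variance with `ddof=0` makes the decomposition an exact identity for the sample of replications rather than an approximation.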

DMDFM can reduce the dimension of the indicators and reasonably reflect the internal structure of panel data; furthermore, the model estimates can be used to predict the dependent variable. To test the prediction performance of DMDFM, we again use the previous DGP to generate a training set and a testing set. To make the graphs easier to read, we predict 20 periods of the dependent variable step by step. We first generate the value of the explanatory variables in every period and the one-period lag of the dependent variable, and then predict the dependent variable one period ahead by the two-step estimation method through models (5)-(7), in order to compare the predicted values with the true values. Figure 1 shows the average predicted value of 100 individuals compared with the true value. Figure 2 shows the predicted values of 6 individuals, randomly drawn from the 100 individuals, compared with the true values.

As can be seen from Figure 1 and Figure 2, both the average over all individuals and the individual predictions obtained via GMM estimation perform well. The one-step predicted values fit both the trend and the individual points well. The constructed model and its estimation method reflect the data generation process well, and the prediction performance is good. Furthermore, if we consider the Mean Absolute Percentage Error (MAPE), a similar conclusion is obtained.
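
For reference, MAPE can be computed as in the following minimal Python sketch. The numbers are hypothetical values chosen for illustration, not predictions from the model, and the function name `mape` is an assumption. Note that MAPE is undefined when a true value equals zero.

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent; y_true must be nonzero."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))

# Hypothetical true and predicted values for four periods.
score = mape([10.0, 20.0, 40.0, 50.0], [11.0, 18.0, 40.0, 55.0])
print(score)
```

Here three of the four predictions are off by 10 percent and one is exact, so the score is about 7.5 percent.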

Figure 1: Average predicted and true value on 20 periods across 100 individuals (plot title: "Predicted vs True on Time across Individual")

Figure 2: Predicted and true value on 20 periods across 6 individuals

6 Conclusion 

In this article, we propose a panel data double factor model which considers the factor decomposition of both the explanatory variables and the error components. The Mixed Double Factor Model is derived from the factor decomposition method, and the two decompositions have different analytical aims. In contrast to the general dynamic factor model, the dynamics of DMDFM refer to the lag terms of the dependent variable. In theory, if panel data have first-order autocorrelation in the time series and heterogeneous components of individuals or periods (fixed effects or random effects), the lag term $Y_{i,t-1}$ of the dependent variable $Y_{it}$ is determined by the expectation of two parts of information: the lagged information set $I_{t-1}$ of the explanatory variables $X_{it}$ and the remaining information given by $X_{it}$, i.e. $E(Y_{it}) = E(Y_{it} \mid I_{t-1}, X_{it}) = E(Y_{it} \mid Y_{i,t-1}, X_{it})$. Dynamic panel data models constructed from lag terms of the explanatory variables and from common factors are different, but the results are excellent. The Dynamic Mixed Double Factor Model is composed of four main parts: lags of the response; common factors of the regressors; the factor error component; and the idiosyncratic error.

The RHS of a dynamic panel data model includes lag terms of the dependent variable, so the assumption of independence between the error term and the lagged dependent variable is not satisfied. OLS or MLE of the dynamic factor model cannot give consistent and efficient estimators, and the generalized method of moments (GMM) is a better alternative. In this article, we propose an iterative GMM to estimate DMDFM. First, we obtain the error component of the model through GMM estimation, and then apply factor decomposition to the estimated error component. The factor decomposition of the estimated error component can be regarded as the intercept of a new model, which is estimated by GMM to obtain the parameter estimates once again. The theoretical proofs and simulation results show that two-step GMM estimation gives consistent estimators of the Dynamic Mixed Double Factor Model. The estimates of DMDFM have good explanatory power and prediction performance.

DMDFM reduces the dimension of a large number of indicators: a large number of explanatory variables are represented by a few common factors, which extends the application scope of the model. However, every explanatory variable has its own meaning in empirical analysis, so as a next step we should consider how to provide a reasonable interpretation of the explanatory variables. The scope of this article is limited to dimension reduction; variable selection based on the explanatory ability of the indicators is not considered, which restricts the applicability of the model. Panel data usually have serial correlation and cross-section correlation, and other structural features may also exist. Structural features related to individuals, e.g. structural change, heteroscedasticity and variance magnitude, are evident in spatial panel data; however, DMDFM cannot fully handle these problems. In future work we will study how to improve DMDFM to reflect the structural features of panel data.

The estimators of DMDFM in this article focus mainly on the expectation; however, the variance of DMDFM should also be taken into account, as in multivariate time series heteroscedasticity models. Other open issues for DMDFM include: consistent estimation of the asymptotic variance; asymptotic efficiency of the estimators; testing of the estimators obtained by GMM estimation, etc. In addition to the theoretical analysis of model construction and estimation, empirical research should also be considered. Because high-dimensional panel data appear in both macroeconomic and microeconomic fields, empirical research combined with an application background should be discussed in the future.

Appendix: Proof of Theoretical results 

A. Proof of Theorem 1. 

Denote $b(z, \beta) = Z_i \Delta e_i$, where $\beta = (\beta_L', \beta_F')'$. From equation (16), we have $E[b(z, \beta)] = 0$. Calculating the partial derivative $\partial b(z, \beta)/\partial\beta$ with respect to each parameter to be estimated, let

$$Db(\beta_L, \beta_F) = (\partial b(z, \beta)/\partial\beta_L', \ \partial b(z, \beta)/\partial\beta_F')'$$

By the uniform consistency of the random disturbance term, a Taylor series expansion around $\beta_L$ and $\beta_F$ gives

$$b(z, \hat\beta) = b(z, \beta) + Db(\beta_L^*, \beta_F^*)(b(z, \hat\beta) - b(z, \beta)) + o(b(z, \beta)) \qquad (A.1)$$

where $\beta^* = (\beta_L^{*\prime}, \beta_F^{*\prime})'$, with $\beta_L^*$ lying between $\hat\beta_L$ and $\beta_L$, and $\beta_F^*$ lying between $\hat\beta_F$ and $\beta_F$. Multiplying both sides by the weighting matrix $A$:

$$A b(z, \hat\beta) = A b(z, \beta) + A\, Db(\beta_L^*, \beta_F^*)(b(z, \hat\beta) - b(z, \beta)) + o(b(z, \beta)) \qquad (A.2)$$

The proof proceeds in the following three steps:

(i) By the previous assumptions, the given optimal weighting matrix $A_0$ yields a unique optimal estimator of $\beta$. $\beta$ is a continuous vector defined on a Euclidean space, and the parameter space $\Theta$ formed by $\beta$ is a closed and bounded subset of that space.

(ii) For $b(z, \beta) = Z_i \Delta e_i$ and $\forall \varepsilon > 0$, from (A.1),

$$E(b(z, \hat\beta)) \to b(z, \beta)$$

so

$$|b(z, \hat\beta) - b(z, \beta)| \to 0$$

For a given matrix $A$, denote

$$S_N(\beta) = b(z, \hat\beta)' \hat A\, b(z, \hat\beta)$$

and

$$S_0(\beta) = b(z, \beta)' A\, b(z, \beta)$$

From (A.3), $S_0(\beta)$ is continuous.

(iii) Next, we prove that $S_N(\beta)$ converges to $S_0(\beta)$ with probability 1.

$$|S_N(\beta) - S_0(\beta)| = |b(z, \hat\beta)' \hat A\, b(z, \hat\beta) - b(z, \beta)' A\, b(z, \beta)|$$

$$= |(b(z, \hat\beta) - b(z, \beta))' \hat A (b(z, \hat\beta) - b(z, \beta)) + b(z, \beta)'(\hat A + \hat A')(b(z, \hat\beta) - b(z, \beta)) + b(z, \beta)'(\hat A - A) b(z, \beta)|$$

Using the triangle inequality,

$$\le |(b(z, \hat\beta) - b(z, \beta))' \hat A (b(z, \hat\beta) - b(z, \beta))| + |b(z, \beta)'(\hat A + \hat A')(b(z, \hat\beta) - b(z, \beta))| + |b(z, \beta)'(\hat A - A) b(z, \beta)|$$

Using the Cauchy-Schwarz inequality,

$$\le \|b(z, \hat\beta) - b(z, \beta)\|^2 \|\hat A\| + 2\|b(z, \beta)\| \, \|b(z, \hat\beta) - b(z, \beta)\| \, \|\hat A\| + \|b(z, \beta)\|^2 \|\hat A - A\|$$

Because

$$b(z, \hat\beta) - b(z, \beta) \to 0$$

and $\hat A$ converges to $A$, we have

$$|S_N(\beta) - S_0(\beta)| \to 0$$

By Newey and McFadden (1994), the conclusion then follows from the uniform convergence theorem.

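B. Proof of Theorem 2.

(1) Because

$$\partial R_1(\beta_L, \beta_F)/\partial\beta = \partial (b(z, \beta)' A\, b(z, \beta))/\partial\beta$$

$$= (\partial b(z, \beta)'/\partial\beta) A\, b(z, \beta) + (\partial b(z, \beta)'/\partial\beta) A\, b(z, \beta)$$

$$= 2 (\partial b(z, \beta)'/\partial\beta) A\, b(z, \beta)$$

where $\beta = (\beta_L, \beta_F)'$; for notational simplicity, this notation is used below. For GMM estimation, we solve the first-order condition and obtain

$$R_1(\hat\beta)' A\, b(z, \hat\beta) = 0 \qquad (B.1)$$

From (A.1), for the optimal matrix $A_0$, we have

$$R_1(\hat\beta)' A_0 \sqrt{N} b(z, \hat\beta) = R_1(\hat\beta)' A_0 \sqrt{N} b(z, \beta) + o(b(z, \beta)) \qquad (B.2)$$

Using a Taylor series expansion around $\beta$,

$$R_1(\hat\beta)' A_0 \sqrt{N} b(z, \hat\beta) = R_1(\hat\beta)' A_0 \big(\sqrt{N} b(z, \beta) + R_1(\beta) \sqrt{N}(\hat\beta - \beta)\big) + o(b(z, \beta))$$

From (B.1), we have

$$R_1(\hat\beta)' A_0 R_1(\beta) \sqrt{N}(\hat\beta - \beta) = -R_1(\hat\beta)' A_0 \sqrt{N} b(z, \beta) + o(b(z, \beta))$$

so

$$\sqrt{N}(\hat\beta - \beta) = -(R_1(\hat\beta)' A_0 R_1(\beta))^{-1} R_1(\hat\beta)' A_0 \sqrt{N} b(z, \beta) + o(b(z, \beta))$$

By equation (23), we have

$$\sqrt{N} b(z, \beta) \xrightarrow{d} N(0, D_1)$$

and

$$(R_1(\beta)' A_0 R_1(\beta))^{-1} R_1(\beta)' A_0$$

is a deterministic matrix, so

$$\sqrt{N}(\hat\beta - \beta) \xrightarrow{d} N(0, \Sigma_1)$$

i.e.

$$\sqrt{N}((\hat\beta_L, \hat\beta_F) - (\beta_L, \beta_F)) \xrightarrow{d} N(0, \Sigma_1)$$

Q.E.D.

(2) The proof is similar to that of (1) and follows by the same argument.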